USING INFLUENCE DIAGRAMS
TO SOLVE THE CALIBRATION PROBLEM


1. INTRODUCTION

A measuring instrument measures a unit and records an observation $y$. The "true" measurement, $x$, of the unit is to be inferred from $y$. If $p(y \mid x)$ is the likelihood of $y$ given $x$ and $x$ has prior $p(x)$, then by Bayes' Theorem

$$ p(x \mid y) \propto p(y \mid x)\, p(x). $$

Let $x_0$ and $\sigma_0^2$ be the mean and variance of $p(x)$. We will assess the likelihood, $p(y \mid x)$, using a linear regression model

$$ y = \alpha + \beta (x - x^*) + \varepsilon \tag{1.1} $$

where $x^*$ is specified and, a priori, $(\alpha, \beta) \perp x \perp \varepsilon$, and $\varepsilon$ given $x$ is $N(0, \sigma^2)$ with $\sigma$ specified. (These assumptions could of course be relaxed, e.g. $\sigma^2$ unknown, $\varepsilon$ dependent on $x$, etc. However, our assumptions are convenient and sufficiently general to provide conclusions of general interest.)

The "center", $x^*$, of the likelihood model and the prior for $x$ are intertwined. The natural choice for $x^*$ is the mean of the prior for $x$, namely $x^* = x_0$. This is reasonable since our attention is focused on calculating $p(x \mid y)$. The line, with $x^* = x_0$, is $y = \alpha + \beta(x - x_0)$, where $\alpha$ and $\beta$ are unknown and, of course, $y$ cannot be observed without error. See Figure 1.1. Of course the prior for $(\alpha, \beta)$ depends on $x^* = x_0$, and we may write $p(\alpha, \beta) = p(\alpha \mid \beta, x_0)\, p(\beta)$ since, in general, only $\alpha$ depends on $x_0$.

Figure 1.2 is an influence diagram describing the logical and statistical dependencies between unknown quantities, decision alternatives and values (losses or utilities). The decision may be an estimate for $x$ given $y$. If the worth or loss is

$$ w(d, x) = (d - x)^2 $$

then the optimal decision will be the posterior mean for $x$ given $y$.
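A small numerical check of this statement follows. It is a sketch with made-up numbers: the discrete posterior (support points and weights) is purely illustrative and not from the paper. The code evaluates the posterior expected squared loss over a fine grid of candidate decisions $d$. It confirms that the minimizer sits at the posterior mean and that the minimum value equals the posterior variance.

```python
# Hypothetical discrete posterior p(x | y); the support points and weights
# are illustrative values, not taken from the paper.
support = [1.0, 2.0, 4.0, 7.0]
weights = [0.1, 0.4, 0.3, 0.2]


def expected_loss(d, xs, ws):
    """Posterior expected squared-error loss E[(d - x)^2 | y]."""
    return sum(w * (d - x) ** 2 for x, w in zip(xs, ws))


posterior_mean = sum(w * x for x, w in zip(support, weights))
posterior_var = sum(w * (x - posterior_mean) ** 2 for x, w in zip(support, weights))

# Brute-force search for the best decision d on a grid with step 0.001.
candidates = [k / 1000 for k in range(0, 8001)]
best_d = min(candidates, key=lambda d: expected_loss(d, support, weights))

print(f"posterior mean = {posterior_mean:.3f}, grid minimizer = {best_d:.3f}")
print(f"minimum expected loss = {expected_loss(best_d, support, weights):.4f}, "
      f"posterior variance = {posterior_var:.4f}")
```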

The Calibration Experiment

The purpose of the calibration experiment is to learn about $(\alpha, \beta)$ so that, given a future observation $y$, we can reduce our uncertainty about a future "true" measurement $x$. To calibrate our measuring instrument, we record $n$ measurements

$$ y = (y_1, y_2, \ldots, y_n) $$

on $n$ units, all of whose "true" measurements,

$$ x = (x_1, x_2, \ldots, x_n), $$

are specified beforehand. Based on our prior, $p(x)$, for a future $x$ (call it $x_f$) and our regression model (1.1), our problem is to determine $x = (x_1, x_2, \ldots, x_n)$ (subject to feasibility constraints) so as to minimize some overall loss function. $x$ is called the experimental design for the calibration experiment.




The following assumptions will be made relative to the calibration experiment.

Assumption 1. The future "true value", $x_f$, is independent of $(\alpha, \beta)$, $x$ and $y$. The future observation, $y_f$, is independent of $(x, y)$ given $(\alpha, \beta)$.

Assumption 2. The worth function $w(d, x_f)$ is a loss function and depends only on $d$ (the decision regarding $x_f$ taken at the time we observe $y_f$) and the "true value" $x_f$. For example, we are ignoring the cost of performing the experiment.

Assumption 3. The feasible region, $R$, for the experimental design, $x$, is bounded. That is, infinite $x_i$ values are not allowed in practice. We seek an optimal experimental design subject to $x \in R$.

Figure 1.3 is an influence diagram describing the logical and statistical dependencies between the unknown quantities and decision variables in our problem. Figure 1.4 shows the influence diagram after $(\alpha, \beta)$ have been eliminated by computing the posterior distribution, $p(\alpha, \beta \mid x, y)$, and then calculating

$$ p(y_f \mid x, y, x_f) = \iint p(y_f \mid \alpha, \beta, x_f)\, p(\alpha, \beta \mid x, y)\, d\alpha\, d\beta. $$

[Influence diagram operations are discussed in Shachter (1986) and Barlow (1987).]
If we take squared error loss, $(d - x_f)^2$, as our worth function, where $x_f$ is the future "true" measurement of a unit, then $d$ is our estimate of $x_f$ after we observe a future $y_f$. Note that at the time of decision, when we estimate $x_f$, we know $x$, $y$ and $y_f$. Since we do not know $x_f$ at this time, $p(x_f \mid y_f, y, x)$ must be found via Bayes' Theorem and

$$ \int (d - x_f)^2\, p(x_f \mid y_f, y, x)\, dx_f $$

calculated. The minimizer is the posterior mean

$$ d = E(x_f \mid y_f, y, x), $$

so that after observing $y_f$ the minimized expected worth is

$$ \operatorname{Var}(x_f \mid y_f, y, x). $$

At the design stage, we of course know neither $y_f$ nor the test results $y$. Hence, using the method of Bayesian decision analysis, we must minimize

$$ W(x) = E_{y \mid x}\, E_{y_f \mid y, x}\, \min_d\, E_{x_f \mid y_f, y, x}\left[ w(d, x_f) \mid y_f, y, x \right] \tag{1.2} $$

with respect to $x = (x_1, x_2, \ldots, x_n)$. $W(x)$ is the final expected worth function with respect to the experimental design $x$.

For a more detailed discussion of this problem and references to other approaches, see Chapter 10 of Aitchison and Dunsmore (1980). Hoadley (1970) discusses the calibration inference problem in some detail and points out the difficulties with the maximum likelihood estimator for $x_f$ given an observation $y_f$ and data $\{(x_i, y_i),\ i = 1, 2, \ldots, n\}$ from a calibration experiment. Brown (1982) and Brown and Sundberg (1985) extend Hoadley's results using a multivariate formulation. However, they do not consider the problem of optimal Bayesian experimental design. The definitive reference for Bayesian design for linear regression is Chaloner (1984). The objective of this paper is to discuss the calibration experimental design problem and to present results for special cases.


Summary of Results

Based on the likelihood, it is shown that the experimental design may be summarized by $n$,

$$ \bar{x} - x_0 = \sum_{i=1}^{n} (x_i - x_0)/n \quad \text{and} \quad v_x^2 = \sum_{i=1}^{n} (x_i - x_0)^2/n, $$

where $|\bar{x} - x_0| \le v_x$.

If $\beta$ is known, $W(x)$ depends only on $n$, and the optimal design corresponds to taking $n$ as large as possible; the values of $x$ are immaterial. If $\alpha$ is known, $W(x)$ depends only on $\sum_{i=1}^{n} (x_i - x_0)^2$ and is decreasing in $v_x^2$ for fixed $n$.

If both $\alpha$ and $\beta$ are unknown, the optimal design can be found by performing a three dimensional search over $(n, \bar{x}, v_x)$. $W(x)$ can be evaluated numerically by using three nested subroutines when the prior for $(\alpha, \beta)$ is bivariate normal and $w(d, x_f) = (d - x_f)^2$. For this case and $x_f \sim N(x_0, \sigma_0^2)$, we can explicitly calculate $W(x \mid \sigma_b^2 = 0)$. Also for this case, $W(x \mid x_1 = x_2 = \cdots = x_n = x_0)$ can be numerically calculated using two nested subroutines.


2. WORTH OF INFORMATION GAINED

Suppose we perform the calibration experiment $x$. Then

$$ \min_d E_{x_f}\left[ w(d, x_f) \right] - W(x) \tag{2.1} $$

is a measure of the expected reduction in our uncertainty about $x_f$ (when $w(d, x_f) = (d - x_f)^2$) as a result of performing the calibration experiment.

Lemma 2.1 shows that this difference is $\ge 0$. This is the familiar expected information inequality in our notation [Raiffa and Schlaifer (1961)]. It gives us an easily computed upper bound on $W(x)$, namely $\min_d E_{x_f}[w(d, x_f)]$, which is useful in checking computer calculations. From Figure 1.3 we see that at the time of decision (e.g. estimating $x_f$) we know $x$, $y$, and $y_f$. It is intuitively clear that when $w(d, x_f)$ is a loss function, the final expected value will be greater the less information we have at the time of decision.

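The following results, stated as lemmas, will be used in the next section.

Lemma 2.1. If the range of possible decisions, $d$, does not depend on $x$ or $y$, then

$$ \begin{aligned} & E_{y \mid x}\, E_{y_f \mid y, x}\, \min_d E_{x_f \mid y_f, y, x}\left[ w(d, x_f) \mid y_f, y, x \right] \\ & \quad \le E_{y_f} \min_d E_{x_f \mid y_f}\left[ w(d, x_f) \mid y_f \right] \\ & \quad \le \min_d E_{x_f}\left[ w(d, x_f) \right]. \end{aligned} $$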

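Proof. We will prove the first inequality. Let $d_0(y_f)$ attain

$$ \min_d E_{x_f \mid y_f}\left[ w(d, x_f) \mid y_f \right] = E_{x_f \mid y_f}\left\{ w[d_0(y_f), x_f] \mid y_f \right\}, $$

so that

$$ E_{y \mid x}\, E_{y_f \mid y, x} \left( \min_d E_{x_f \mid y_f, y, x}\left[ w(d, x_f) \mid y_f, y, x \right] - E_{x_f \mid y_f, y, x}\left\{ w[d_0(y_f), x_f] \mid y_f, y, x \right\} \right) \le 0. $$

We need only show

$$ E_{y \mid x}\, E_{y_f \mid y, x}\, E_{x_f \mid y_f, y, x}\left\{ w[d_0(y_f), x_f] \mid y_f, y, x \right\} = E_{y_f}\, E_{x_f \mid y_f}\left\{ w[d_0(y_f), x_f] \mid y_f \right\}. $$

From Bayes' theorem and the fact that $x_f$ is independent of $(x, y)$ we have

$$ \begin{aligned} p(x_f, y_f \mid x, y)\, p(y \mid x) &= \left[ p(y \mid x_f, y_f, x)\, p(x_f, y_f) / p(y \mid x) \right] p(y \mid x) \\ &= p(y \mid x_f, y_f, x)\, p(x_f, y_f). \end{aligned} $$

The result follows by an interchange in the order of integration: integrating over $y$ first leaves the expectation with respect to $p(x_f, y_f)$.

The second inequality follows in a similar way. QED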

Remark. $E_{y_f} \min_d E_{x_f \mid y_f}[w(d, x_f) \mid y_f]$ corresponds to not performing the calibration experiment (i.e., $n = 0$). When $w(d, x_f) = (d - x_f)^2$ the above inequalities become

$$ E_{y \mid x}\, E_{y_f \mid y, x} \operatorname{Var}(x_f \mid y_f, y, x) \le E_{y_f} \operatorname{Var}(x_f \mid y_f) \le \operatorname{Var}(x_f). $$

It follows from Lemma 2.1 that the expected worth function can only decrease if we perform additional calibration experiments. We use this fact later. (This would not be true if $w(\cdot, \cdot)$ depended on $(x, y)$.)

Lemma 2.2. Under the assumptions of Lemma 2.1 and $w(d, x_f)$ a loss function,

$$ W(x_1, \ldots, x_n) \ge W(x_1, \ldots, x_n, x_{n+1}), $$

where the first $n$ coordinates are the same on both sides of the inequality.

3. LIKELIHOOD AND THE OPTIMAL EXPERIMENTAL DESIGN

Under the assumption that the observation errors, $\{\varepsilon_i \mid i = 1, 2, \ldots, n\}$, are independent $N(0, \sigma^2)$, but without specifying prior distributions, we can determine some of the structure of the optimal experimental design. This can be done using the sufficient statistics for $(\alpha, \beta)$ corresponding to our likelihood model. As noted before, the purpose of the calibration experiment is to learn about $(\alpha, \beta)$. The likelihood for $(\alpha, \beta)$ given the data is

$$ L(\alpha, \beta \mid \text{Data}, x_0) \propto \exp\left\{ -\sum_{i=1}^{n} [y_i - \alpha - \beta(x_i - x_0)]^2 / 2\sigma^2 \right\}. $$


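A priori assume $\alpha \perp \beta \perp \varepsilon$ and let $E(\alpha) = a$, $E(\beta) = b$, $\operatorname{Var}(\alpha) = \sigma_a^2$ and $\operatorname{Var}(\beta) = \sigma_b^2$. Define

$$ e_i = y_i - a - b(x_i - x_0) $$

and rewrite

$$ \begin{aligned} y_i - \alpha - \beta(x_i - x_0) &= [y_i - a - b(x_i - x_0)] - (\alpha - a) - (\beta - b)(x_i - x_0) \\ &= e_i - (\alpha - a) - (\beta - b)(x_i - x_0), \end{aligned} $$

so that

$$ L(\alpha, \beta \mid \text{Data}, x_0) \propto \exp\Big\{ -\Big[ n(\alpha - a)^2 + (\beta - b)^2 \sum_i (x_i - x_0)^2 - 2 \sum_i e_i \big[ (\alpha - a) + (\beta - b)(x_i - x_0) \big] + 2(\alpha - a)(\beta - b) \sum_i (x_i - x_0) \Big] \Big/ 2\sigma^2 \Big\}. \tag{3.2} $$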


Clearly $n$, $\sum_i (x_i - x_0)$, $\sum_i (x_i - x_0)^2$, $z_1 = \sum_i e_i$ and $z_2 = \sum_i e_i (x_i - x_0)$ are sufficient statistics for $(\alpha, \beta)$, since $x_0$, $a$, $b$ and $\sigma$ are specified. It follows that the posterior density for $(\alpha, \beta)$ also depends on the data only through $n$, $\sum_i (x_i - x_0)$, $\sum_i (x_i - x_0)^2$, $z_1$, and $z_2$.
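The sufficiency claim can be checked numerically. The sketch below uses hypothetical function names and illustrative values for the prior means, noise level, design and simulated data. It computes $n$, $\sum(x_i - x_0)$, $\sum(x_i - x_0)^2$, $z_1$ and $z_2$ and evaluates the expanded likelihood exponent above from these statistics alone. Its difference from the log-likelihood computed from the raw data is the same constant, $-\sum_i e_i^2/2\sigma^2$, for every $(\alpha, \beta)$, which is what sufficiency requires.

```python
import random


def sufficient_stats(xs, ys, x0, a, b):
    """Return (n, S1, S2, z1, z2) for the calibration data.

    S1 = sum(x_i - x0), S2 = sum((x_i - x0)^2),
    z1 = sum(e_i), z2 = sum(e_i (x_i - x0)), with e_i = y_i - a - b (x_i - x0).
    """
    d = [x - x0 for x in xs]
    e = [y - a - b * di for y, di in zip(ys, d)]
    return (len(xs), sum(d), sum(di * di for di in d),
            sum(e), sum(ei * di for ei, di in zip(e, d)))


def log_lik_raw(alpha, beta, xs, ys, x0, sigma):
    """Log-likelihood of (alpha, beta), up to a constant, from the raw data."""
    return -sum((y - alpha - beta * (x - x0)) ** 2
                for x, y in zip(xs, ys)) / (2 * sigma**2)


def log_lik_from_stats(alpha, beta, stats, a, b, sigma):
    """Expanded exponent of the likelihood: uses only the statistics."""
    n, s1, s2, z1, z2 = stats
    da, db = alpha - a, beta - b
    q = n * da**2 + db**2 * s2 - 2 * (z1 * da + z2 * db) + 2 * da * db * s1
    return -q / (2 * sigma**2)


# Illustrative values (not from the paper): prior means a, b, noise sigma,
# a hypothetical design xs and simulated observations ys.
rng = random.Random(1)
x0, a, b, sigma = 10.0, 1.0, 2.0, 0.5
xs = [8.0, 9.5, 10.0, 11.0, 12.5]
ys = [1.3 + 1.8 * (x - x0) + rng.gauss(0.0, sigma) for x in xs]

stats = sufficient_stats(xs, ys, x0, a, b)
gaps = []
for alpha, beta in [(0.0, 0.0), (1.3, 1.8), (-2.0, 5.0)]:
    gap = (log_lik_raw(alpha, beta, xs, ys, x0, sigma)
           - log_lik_from_stats(alpha, beta, stats, a, b, sigma))
    gaps.append(gap)
    print(f"alpha={alpha:5.1f}  beta={beta:4.1f}  raw - stats = {gap:.10f}")
```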


Theorem 3.1. $W(x)$ depends on $x$ only through $n$, $\bar{x} - x_0 = \sum_i (x_i - x_0)/n$ and $v_x^2 = \sum_i (x_i - x_0)^2/n$.

N.B. This is true for all worth functions $w(d, x_f)$ and priors on $(\alpha, \beta)$ and $x_f$. The worth function may also depend on $n$, $\bar{x} - x_0$ and $v_x$ without affecting this conclusion.


Proof. The purpose of the calibration experiment is to learn about $(\alpha, \beta)$. Since $n$, $\bar{x} - x_0$, $v_x^2$, $z_1$ and $z_2$ are sufficient statistics for $(\alpha, \beta)$, the test results, $y$, may be summarized by $z_1$ and $z_2$. Hence from (1.2) we need only show that the joint distribution of $(z_1, z_2)$ depends on $x$ only through $n$, $\bar{x} - x_0$ and $v_x$.

It is easy to show that $(z_1, z_2)$ given $(\alpha, \beta)$ is bivariate normal, where $z_1$ given $(\alpha, \beta)$ is

$$ N\Big[ n(\alpha - a) + (\beta - b) \sum_i (x_i - x_0),\ n\sigma^2 \Big] $$

and $z_2$ given $(\alpha, \beta)$ is

$$ N\Big[ (\alpha - a) \sum_i (x_i - x_0) + (\beta - b) \sum_i (x_i - x_0)^2,\ \sigma^2 \sum_i (x_i - x_0)^2 \Big], $$

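while

$$ \operatorname{Cov}(z_1, z_2 \mid \alpha, \beta) = \sigma^2 \sum_i (x_i - x_0). $$

Since $\sum_i (x_i - x_0) = n(\bar{x} - x_0)$ and $\sum_i (x_i - x_0)^2 = n v_x^2$, all of these moments depend on $x$ only through $n$, $\bar{x} - x_0$ and $v_x$. QED

Corollary 3.2. If $\beta$ is known, i.e. $\sigma_b^2 = 0$, then $W(x)$ depends on $x$ only through $n$. The "levels" $(x_1, x_2, \ldots, x_n)$ are immaterial and we might just as well take

$$ x_1 = x_2 = \cdots = x_n = x_0 $$

or any other values that we like.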

Proof. If we are certain that $\beta = b$, i.e. $\sigma_b^2 = 0$, then (3.2) becomes

$$ L(\alpha \mid \text{Data}, x_0) \propto \exp\Big\{ -\Big[ n(\alpha - a)^2 - 2 \sum_i e_i (\alpha - a) \Big] \Big/ 2\sigma^2 \Big\}. $$

Hence $n$ and $z_1 = \sum_i e_i = \sum_i [y_i - a - b(x_i - x_0)]$ are sufficient for $\alpha$. Since $z_1$ given $(\alpha, \beta = b)$ is

$$ N[n(\alpha - a),\ n\sigma^2], $$

it follows that $W(x)$ depends on $x$ only through $n$.


Corollary 3.3. If $\alpha$ is known, i.e. $\sigma_a^2 = 0$, then $W(x)$ depends on $x$ only through $\sum_i (x_i - x_0)^2$. Furthermore, for fixed $n$, $W(x)$ is decreasing in $v_x^2$. In this case, $W(x)$ is minimized for those $x$ belonging to $R$ for which $v_x^2$ is a maximum.

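Proof. If $\sigma_a^2 = 0$, then (3.2) becomes

$$ L(\beta \mid \text{Data}, x_0) \propto \exp\Big\{ -\Big[ (\beta - b)^2 \sum_i (x_i - x_0)^2 - 2(\beta - b) \sum_i e_i (x_i - x_0) \Big] \Big/ 2\sigma^2 \Big\}. $$

Hence $\sum_i (x_i - x_0)^2$ and $z_2 = \sum_i e_i (x_i - x_0)$ are sufficient for $\beta$. Since $z_2$ given $(\alpha = a, \beta)$ is

$$ N\Big[ (\beta - b) \sum_i (x_i - x_0)^2,\ \sigma^2 \sum_i (x_i - x_0)^2 \Big], $$

it follows that when $\alpha = a$ is known, $W(x)$ depends on $x$ only through $\sum_i (x_i - x_0)^2$.

Suppose $\sum_{i=1}^{n} (x_i - x_0)^2 < \sum_{i=1}^{n} (x_i' - x_0)^2$. Clearly we can find $x_{n+1}$ such that

$$ \sum_{i=1}^{n} (x_i' - x_0)^2 = \sum_{i=1}^{n} (x_i - x_0)^2 + (x_{n+1} - x_0)^2 = \sum_{i=1}^{n+1} (x_i - x_0)^2. $$

From Lemma 2.2 in Section 2 we have

$$ W(x_1, \ldots, x_n) \ge W(x_1, \ldots, x_n, x_{n+1}). $$

The $(n+1)$-point design on the right has the same value of $\sum_i (x_i - x_0)^2$ as $x'$, so its expected worth equals $W(x_1', \ldots, x_n')$. Hence $W(x)$ is decreasing in $\sum_i (x_i - x_0)^2$ for fixed $n$.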


Determining the Structure of the Optimal Experimental Design

Since

$$ \sum_i (x_i - \bar{x})^2 / n \ge 0, $$

it follows that

$$ \sum_i (x_i - x_0 + x_0 - \bar{x})^2 / n = \sum_i (x_i - x_0)^2 / n - (\bar{x} - x_0)^2 \ge 0 $$

and $|\bar{x} - x_0| \le v_x$.

Consequently, the minimization problem with respect to $x$ can be transformed to a minimization problem with respect to only three variables, namely $n$, $\bar{x} - x_0$ and $v_x$, subject to $|\bar{x} - x_0| \le v_x$.

Since $\bar{x} - x_0$ and $v_x^2$ are symmetric functions of an experimental design $x$, it follows that, for fixed $n$, any permutation of the coordinates of an experimental design solution is also a solution (if allowed by the feasibility constraints). Figure 3.1 shows the nature of the possible $(x_1, x_2)$ solutions for $v_x$ fixed and $n = 2$. The darkened arcs on the circumference show the possible designs (up to permutations of coordinates). For fixed $v_x$, possible solutions are traced out by the intersection of the line $\bar{x} - x_0 = c$ with the circumference of the circle

$$ (x_1 - x_0)^2 + (x_2 - x_0)^2 = 2 v_x^2 $$

as $c$ varies from $-v_x$ to $v_x$.
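For the $n = 2$ picture of Figure 3.1, the intersection points can be written down explicitly. With $u_i = x_i - x_0$, the conditions $u_1 + u_2 = 2c$ and $u_1^2 + u_2^2 = 2v_x^2$ give $u = c \pm \sqrt{v_x^2 - c^2}$. The sketch below uses hypothetical function names and illustrative $x_0$ and $v_x$. It computes the design summaries $(n, \bar{x} - x_0, v_x)$ of Theorem 3.1 and traces these points as $c$ runs from $-v_x$ to $v_x$. The summaries of each constructed design should return the $c$ and $v_x$ that produced it.

```python
import math


def design_summary(xs, x0):
    """Return (n, xbar - x0, v_x), where v_x^2 = sum((x_i - x0)^2) / n."""
    n = len(xs)
    offset = sum(x - x0 for x in xs) / n
    v_x = math.sqrt(sum((x - x0) ** 2 for x in xs) / n)
    return n, offset, v_x


def two_point_design(x0, c, v_x):
    """The n = 2 design with xbar - x0 = c on (x1-x0)^2 + (x2-x0)^2 = 2 v_x^2.

    With u_i = x_i - x0, u1 + u2 = 2c and u1^2 + u2^2 = 2 v_x^2 give
    u = c +/- sqrt(v_x^2 - c^2); swapping the two points is the permuted solution.
    """
    if abs(c) > v_x:
        raise ValueError("a design requires |xbar - x0| <= v_x")
    h = math.sqrt(v_x**2 - c**2)
    return x0 + c + h, x0 + c - h


# Trace the intersection points as c runs from -v_x to v_x (illustrative x0, v_x).
x0, v_x = 5.0, 1.0
for c in [-1.0, -0.5, 0.0, 0.5, 1.0]:
    x1, x2 = two_point_design(x0, c, v_x)
    n, off, v = design_summary([x1, x2], x0)
    print(f"c = {c:+.1f}: (x1, x2) = ({x1:.4f}, {x2:.4f}), xbar - x0 = {off:+.4f}, v_x = {v:.4f}")
```

The inequality $|\bar{x} - x_0| \le v_x$ is what makes the square root real. A request outside that range raises an error rather than returning a complex point.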

The optimal experimental design, $x$, can, in theory, be found through a three dimensional search over the feasible region $R$. One strategy would be to fix $n$ and, using a computer, calculate a three dimensional plot of

$$ W(x) = E_{y \mid x}\, E_{y_f \mid y, x}\, \min_d E_{x_f \mid y_f, y, x}\left[ w(d, x_f) \mid y_f, y, x \right] $$

versus $\bar{x} - x_0$ and $v_x$. Figure 3.2 illustrates the three dimensional plot for a fixed $n$. The plot shows the surface of $W(x)$ as a function of $\bar{x} - x_0$ and $v_x$ over the region $|\bar{x} - x_0| \le v_x$.


The Case $x_1 = x_2 = \cdots = x_n = x_0$


Suppose we are uncertain about both $\alpha$ and $\beta$. From (3.2) we see that if $x_1 = x_2 = \cdots = x_n = x_0$, then

$$ L(\alpha, \beta \mid \text{Data}) \propto \exp\Big\{ -\Big[ n(\alpha - a)^2 - 2 \sum_i e_i (\alpha - a) \Big] \Big/ 2\sigma^2 \Big\}, $$

so that in this case the data provide no direct information about $\beta$. If, in addition, the prior for $(\alpha, \beta)$ satisfies

$$ p(\alpha, \beta \mid x_0) \propto p(\alpha \mid x_0)\, p(\beta), $$

i.e. $\alpha$ and $\beta$ are a priori independent given $x_0$, then

$$ p(\alpha, \beta \mid \text{Data}, x_0) \propto L(\alpha \mid \text{Data}, x_0)\, p(\alpha \mid x_0)\, p(\beta) $$

and the posterior marginal for $\beta$ is the same as the prior marginal for $\beta$. Intuitively, if $\beta$ is unknown, the experimental design

$$ x_1 = x_2 = \cdots = x_n = x_0 $$

is a local maximum for the final expected value, since values of $x_i$ near, but different from, $x_0$ will provide information about $\beta$ and hence tend to reduce the final expected value.


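The Case $w(d, x_f) = (d - x_f)^2$

In this case

$$ \begin{aligned} W(x) &= E_{y \mid x}\, E_{y_f \mid y, x} \operatorname{Var}(x_f \mid y_f, y, x) \\ &= E_{y \mid x}\, E_{y_f \mid y, x}\, E_{x_f \mid y_f, y, x}(x_f^2 \mid y_f, y, x) - E_{y \mid x}\, E_{y_f \mid y, x} \left[ E_{x_f \mid y_f, y, x}(x_f \mid y_f, y, x) \right]^2. \end{aligned} $$

Since $x_f$ is independent of $(x, y)$, we can explicitly evaluate the first term, $E(x_f^2) = \sigma_0^2 + x_0^2$, so that

$$ W(x) = \sigma_0^2 + x_0^2 - E_{y \mid x}\, E_{y_f \mid y, x} \left\{ \left[ E_{x_f \mid y_f, y, x}(x_f \mid y_f, y, x) \right]^2 \right\}. $$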


4. BIVARIATE NORMAL PRIOR FOR $(\alpha, \beta)$

To calculate $W(x)$ for a particular experimental design we need to assess a prior distribution for $(\alpha, \beta)$. Suppose $\alpha \perp \beta \perp \varepsilon$ and $\alpha$ has a $N(a, \sigma_a^2)$ distribution while $\beta$ has a $N(b, \sigma_b^2)$ distribution a priori. Table 4.1 gives the posterior bivariate normal parameters given the sufficient statistics $n$, $\bar{x} - x_0$, $v_x$, $z_1 = \sum_i e_i$ and $z_2 = \sum_i e_i (x_i - x_0)$. Note that the posterior variances and correlation, $\sigma_\alpha^2$, $\sigma_\beta^2$ and $\rho_{\alpha\beta}$, do not depend on the observations, $y$, from the calibration experiment. The derivation of the posterior parameters in Table 4.1 is given in the appendix.

Our objective is to calculate $W(x)$ for a given experimental design $x$. However, this is in general exceedingly difficult numerically. Hence we are also interested in bounds and efficient computational methods for special cases.


$$ \begin{aligned} \mu_\alpha &= a + \frac{\left(\sum_i e_i\right)\left[\sum_i (x_i - x_0)^2 + \sigma^2/\sigma_b^2\right] - \left[\sum_i (x_i - x_0)\right]\left[\sum_i e_i (x_i - x_0)\right]}{D} \\ \mu_\beta &= b + \frac{(n + \sigma^2/\sigma_a^2)\left[\sum_i e_i (x_i - x_0)\right] - \left[\sum_i (x_i - x_0)\right]\left(\sum_i e_i\right)}{D} \\ \sigma_\alpha^2 &= \frac{\sigma^2 \left[\sum_i (x_i - x_0)^2 + \sigma^2/\sigma_b^2\right]}{D} \\ \sigma_\beta^2 &= \frac{\sigma^2 (n + \sigma^2/\sigma_a^2)}{D} \\ \operatorname{Cov}(\alpha, \beta) &= \frac{-\sigma^2 \sum_i (x_i - x_0)}{D} \end{aligned} $$

where

$$ D = (n + \sigma^2/\sigma_a^2)\left[\sum_i (x_i - x_0)^2 + \sigma^2/\sigma_b^2\right] - \left[\sum_i (x_i - x_0)\right]^2 $$

and

$$ e_i = y_i - a - b(x_i - x_0). $$

TABLE 4.1. Parameters of the Posterior Distribution of $(\alpha, \beta)$ Given $x$ and $y$
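Table 4.1 can be checked against a different route to the same posterior. The sketch below uses hypothetical names and illustrative numbers. It codes the table with the abbreviations $A = n + \sigma^2/\sigma_a^2$, $B = \sum(x_i - x_0)^2 + \sigma^2/\sigma_b^2$ and $D = AB - [\sum(x_i - x_0)]^2$. It then compares the result with a sequential (Kalman-filter style) update of the bivariate normal prior, one calibration observation at a time. The two should agree to rounding error.

```python
def posterior_params(n, s1, s2, z1, z2, a, b, sigma, sigma_a, sigma_b):
    """Table 4.1: posterior (mu_alpha, mu_beta, var_alpha, var_beta, cov).

    s1 = sum(x_i - x0), s2 = sum((x_i - x0)^2), z1 = sum(e_i),
    z2 = sum(e_i (x_i - x0)); sigma_a, sigma_b are prior standard deviations.
    """
    A = n + sigma**2 / sigma_a**2
    B = s2 + sigma**2 / sigma_b**2
    D = A * B - s1**2                       # common denominator in Table 4.1
    mu_alpha = a + (z1 * B - s1 * z2) / D
    mu_beta = b + (A * z2 - s1 * z1) / D
    return mu_alpha, mu_beta, sigma**2 * B / D, sigma**2 * A / D, -sigma**2 * s1 / D


def sequential_posterior(xs, ys, x0, a, b, sigma, sigma_a, sigma_b):
    """Same posterior, updating the bivariate normal one observation at a time."""
    m = [a, b]
    C = [[sigma_a**2, 0.0], [0.0, sigma_b**2]]
    for x, y in zip(xs, ys):
        h = [1.0, x - x0]                                  # y = h . (alpha, beta) + noise
        Ch = [C[0][0] * h[0] + C[0][1] * h[1], C[1][0] * h[0] + C[1][1] * h[1]]
        S = h[0] * Ch[0] + h[1] * Ch[1] + sigma**2        # predictive variance of y
        K = [Ch[0] / S, Ch[1] / S]                         # gain vector
        r = y - (h[0] * m[0] + h[1] * m[1])                # prediction error
        m = [m[0] + K[0] * r, m[1] + K[1] * r]
        C = [[C[i][j] - K[i] * Ch[j] for j in range(2)] for i in range(2)]
    return m[0], m[1], C[0][0], C[1][1], C[0][1]


# Illustrative prior, noise level and data (not from the paper).
x0, a, b, sigma, sigma_a, sigma_b = 10.0, 1.0, 2.0, 0.5, 2.0, 1.5
xs = [8.0, 9.5, 10.0, 11.0, 12.5]
ys = [-3.1, 0.4, 1.2, 3.0, 6.1]

d = [x - x0 for x in xs]
e = [y - a - b * di for y, di in zip(ys, d)]
table = posterior_params(len(xs), sum(d), sum(di * di for di in d), sum(e),
                         sum(ei * di for ei, di in zip(e, d)),
                         a, b, sigma, sigma_a, sigma_b)
sequential = sequential_posterior(xs, ys, x0, a, b, sigma, sigma_a, sigma_b)
for name, t, s in zip(["mu_alpha", "mu_beta", "var_alpha", "var_beta", "cov"],
                      table, sequential):
    print(f"{name:9s} table = {t:.8f}   sequential = {s:.8f}")
```

The sequential route never forms $D$. Agreement therefore checks the closed-form algebra rather than repeating it.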


From the influence diagram, Figure 1.3, we see that at the time of decision, $\alpha$ and $\beta$ are unknown. Hence we must first calculate the posterior distribution of $\alpha$ and $\beta$ given $n$, $\bar{x} - x_0$, $v_x$, $z_1$ and $z_2$. The distribution of $y_f$ given $x_f$, $y$ and $x$ is then

$$ N\left[ \mu_\alpha + \mu_\beta (x_f - x_0),\ s^2(x_f) \right] $$

where

$$ s^2(x_f) = \sigma^2 + \sigma_\alpha^2 + \sigma_\beta^2 (x_f - x_0)^2 + 2\operatorname{Cov}(\alpha, \beta)(x_f - x_0). \tag{4.1} $$

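Using Bayes' theorem,

$$ \begin{aligned} p(x_f \mid y_f, y, x) &\propto p(y_f \mid x_f, y, x)\, p(x_f) \\ &\propto s(x_f)^{-1} \exp\left\{ -\left[ y_f - \mu_\alpha - \mu_\beta (x_f - x_0) \right]^2 / 2 s^2(x_f) \right\} p(x_f). \end{aligned} \tag{4.2} $$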
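Equation (4.2) can be evaluated by brute force once the posterior parameters of $(\alpha, \beta)$ are known. The sketch below assumes $x_f \sim N(x_0, \sigma_0^2)$ and uses a plain sum over an equally spaced grid. The grid rule, the function name `posterior_xf` and all numbers are illustrative choices, not the paper's. It keeps the factor $s(x_f)^{-1}$, because $s^2(x_f)$ varies with $x_f$. With $\sigma_\beta^2 = 0$ and zero covariance it should reproduce the normal posterior of Theorem 5.1. With $\sigma_\beta^2 > 0$ the posterior variance changes with $y_f$, so the posterior is not normal.

```python
import math


def posterior_xf(y_f, mu_alpha, mu_beta, var_alpha, var_beta, cov_ab,
                 sigma, x0, sigma0, half_width=8.0, m=2001):
    """Mean and variance of p(x_f | y_f, y, x) from (4.2), by brute force on a grid.

    Assumes x_f ~ N(x0, sigma0^2); the grid covers x0 +/- half_width*sigma0
    with m equally spaced points, so the grid spacing cancels in the ratios.
    """
    w0 = w1 = w2 = 0.0
    for k in range(m):
        u = sigma0 * half_width * (2 * k / (m - 1) - 1)                 # u = x_f - x0
        s2 = sigma**2 + var_alpha + var_beta * u * u + 2 * cov_ab * u   # (4.1)
        lik = math.exp(-(y_f - mu_alpha - mu_beta * u) ** 2 / (2 * s2)) / math.sqrt(s2)
        wgt = lik * math.exp(-u * u / (2 * sigma0**2))                 # likelihood x prior
        w0 += wgt
        w1 += wgt * u
        w2 += wgt * u * u
    mean_u = w1 / w0
    return x0 + mean_u, w2 / w0 - mean_u**2


# Illustrative posterior parameters for (alpha, beta) (not from the paper).
params = dict(mu_alpha=1.0, mu_beta=2.0, var_alpha=0.1, cov_ab=0.0,
              sigma=0.5, x0=10.0, sigma0=1.0)
for y_f in [-3.0, 1.0, 5.0]:
    _, v_known = posterior_xf(y_f, var_beta=0.0, **params)
    _, v_unknown = posterior_xf(y_f, var_beta=1.0, **params)
    print(f"y_f = {y_f:+.1f}: Var(x_f | ...) = {v_known:.5f} (var_beta = 0), "
          f"{v_unknown:.5f} (var_beta = 1)")
```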

Subtract $\mu_\alpha$ from $y_f$ and let

$$ w_f = y_f - \mu_\alpha. $$

From (4.2), it is clear that

$$ x_f \perp (y_f, y) \mid w_f, \mu_\beta, \ldots, $$

i.e. $w_f$ and $\mu_\beta$ are sufficient for $x_f$ with respect to $(y_f, y)$, where for convenience "$\ldots$" stands for all parameters which depend only on $n$, $\bar{x}$ and $v_x$.

Since we consider $n$, $\bar{x}$ and $v_x$ fixed and known in this section, we will omit these parameters in our conditioning statements. Also $\sigma_\alpha^2$, $\sigma_\beta^2$, and $\rho_{\alpha\beta}$ depend only on $n$, $\bar{x}$, $v_x$ and our bivariate normal prior parameter values. Hence these will also be omitted henceforth in our conditioning statements.

Based on sufficiency considerations for bivariate normal priors, the influence diagram in Figure 1.3 can be redrawn as in Figure 4.1. Note that whenever we needed to use Bayes' theorem (to achieve arrow reversals), it was also helpful at that point to employ sufficiency considerations to reduce the parameter space.


Calculation of $W(x)$

Equation (4.2) is the crux of our numerical difficulties, since $p(x_f \mid w_f, \mu_\beta)$ is not normal even when $x_f$ is $N(x_0, \sigma_0^2)$. Figures 4.2 and 4.3 are plots of $E(x_f \mid y_f)$ and $\operatorname{Var}(x_f \mid y_f)$ versus $y_f$ when the calibration experiment is not performed (i.e. $n = 0$). Were $x_f$ and $y_f$ jointly bivariate normal, $\operatorname{Var}(x_f \mid y_f)$ would not depend on $y_f$, as it obviously does in Figure 4.3.

Using Figure 4.1 we see that

$$ W(x) = E_{\mu_\beta}\, E_{w_f \mid \mu_\beta}\, \min_d E_{x_f \mid w_f, \mu_\beta}\left[ w(d, x_f) \mid w_f, \mu_\beta \right]. $$

When $w(d, x_f) = (d - x_f)^2$ we have

$$ W(x) = E_{\mu_\beta}\, E_{w_f \mid \mu_\beta} \operatorname{Var}(x_f \mid w_f, \mu_\beta). $$

We can thus numerically calculate $W(x)$ using three nested subroutines for each $x$. The computational running time will be proportional to the product of the numbers of points used in each subroutine.
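The three nested subroutines can be sketched as follows. This is an illustration under stated assumptions, not the authors' program. It takes the design-dependent posterior quantities $\sigma_\alpha^2$, $\sigma_\beta^2$, $\operatorname{Cov}(\alpha, \beta)$ and the standard deviation of $\mu_\beta$ as inputs. It treats $\mu_\beta$ as $N(b, \sigma_{\mu_\beta}^2)$, as in the General Case of Section 5, and assumes $x_f \sim N(x_0, \sigma_0^2)$ and squared-error loss. Each integral is replaced by a sum over an equally spaced grid; the grid sizes and the names `normal_grid` and `w_three_nested` are hypothetical. The running time is proportional to `n_mu * n_w * n_u`, as the text says.

```python
import math


def normal_grid(mean, sd, m, half_width):
    """m equally spaced points over mean +/- half_width*sd, normalized N(mean, sd^2) weights."""
    if sd == 0.0:
        return [(mean, 1.0)]
    pts = [mean + sd * half_width * (2 * k / (m - 1) - 1) for k in range(m)]
    wts = [math.exp(-0.5 * ((p - mean) / sd) ** 2) for p in pts]
    total = sum(wts)
    return [(p, w / total) for p, w in zip(pts, wts)]


def w_three_nested(b, sd_mu_beta, var_alpha, var_beta, cov_ab, sigma, sigma0,
                   n_mu=21, n_w=101, n_u=121, half_width=8.0):
    """W(x) = E_{mu_beta} E_{w_f | mu_beta} Var(x_f | w_f, mu_beta) (squared-error loss).

    Works with u = x_f - x0 ~ N(0, sigma0^2) and, by Theorem 4.1,
    w_f | u, mu_beta ~ N(mu_beta * u, s^2(u)) with s^2(u) from (4.1).
    """
    u_grid = normal_grid(0.0, sigma0, n_u, half_width)
    s2 = [sigma**2 + var_alpha + var_beta * u * u + 2 * cov_ab * u for u, _ in u_grid]
    inv_s = [1.0 / math.sqrt(v) for v in s2]
    W = 0.0
    for mu, p_mu in normal_grid(b, sd_mu_beta, n_mu, 6.0):            # level 1: mu_beta
        sd_w = math.sqrt(mu**2 * sigma0**2 + sigma**2 + var_alpha + var_beta * sigma0**2)
        acc = norm = 0.0
        for k in range(n_w):                                             # level 2: w_f | mu_beta
            w = sd_w * half_width * (2 * k / (n_w - 1) - 1)
            g0 = g1 = g2 = 0.0
            for (u, p_u), v, i_s in zip(u_grid, s2, inv_s):              # level 3: x_f | w_f, mu_beta
                g = p_u * i_s * math.exp(-(w - mu * u) ** 2 / (2 * v))
                g0 += g
                g1 += g * u
                g2 += g * u * u
            if g0 == 0.0:                   # numerically negligible value of w_f
                continue
            mean = g1 / g0
            acc += g0 * (g2 / g0 - mean**2)   # g0 is proportional to p(w_f | mu_beta)
            norm += g0
        W += p_mu * acc / norm
    return W


# Illustrative inputs (not from the paper).
b, sigma, sigma0, var_alpha = 2.0, 0.5, 1.0, 0.1
closed_form = 1.0 / (b**2 / (sigma**2 + var_alpha) + 1.0 / sigma0**2)   # Theorem 5.1
W_known = w_three_nested(b, 0.0, var_alpha, 0.0, 0.0, sigma, sigma0)
W_general = w_three_nested(b, 0.3, var_alpha, 0.2, -0.05, sigma, sigma0)
print(f"closed form {closed_form:.6f}, nested (beta known) {W_known:.6f}, general {W_general:.6f}")
```

With $\sigma_\beta^2 = 0$ and $\sigma_{\mu_\beta} = 0$ the outer loop collapses to $\mu_\beta = b$. The result should then reproduce the closed form of Theorem 5.1, which is a convenient test of the grids. Lemma 2.1 supplies a second check: the result should never exceed $\sigma_0^2$.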

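The Distribution of $w_f \mid x_f, \mu_\beta$

To calculate the posterior distribution of $x_f$ given $w_f$ and $\mu_\beta$ we first need the distribution of $w_f$ given $x_f$ and $\mu_\beta$.

Theorem 4.1. $p(w_f \mid x_f, \mu_\beta)$ is $N[\mu_\beta (x_f - x_0),\ s^2(x_f)]$, where $s^2(x_f)$ is given by (4.1).

Proof. Clearly

$$ E[w_f \mid x_f, \mu_\beta] = E_{\mu_\alpha \mid \mu_\beta}\, E[w_f \mid x_f, \mu_\alpha, \mu_\beta] = \mu_\beta (x_f - x_0). $$

Since $x_f$ is independent of $(\alpha, \beta)$ and $y$, and $(z_1, z_2)$ only appear in $\mu_\alpha$ and $\mu_\beta$,

$$ \operatorname{Var}(w_f \mid x_f, \mu_\beta) = E_{\mu_\alpha \mid \mu_\beta}\left[ \operatorname{Var}(w_f \mid x_f, \mu_\alpha, \mu_\beta) \right] + \operatorname{Var}_{\mu_\alpha \mid \mu_\beta}\left[ E(w_f \mid x_f, \mu_\alpha, \mu_\beta) \right]. $$

From (4.1) we see that the first term is the same as $s^2(x_f)$, which is constant in $(z_1, z_2)$. The second term is 0 because $E(w_f \mid x_f, \mu_\alpha, \mu_\beta) = \mu_\beta(x_f - x_0)$ does not depend on $\mu_\alpha$.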

5. NUMERICAL CALCULATION OF $W(x)$ WHEN $(\alpha, \beta)$ IS BIVARIATE NORMAL

The Case When $\beta$ Is Known and $w(d, x_f) = (d - x_f)^2$

As we noted in Section 3, when $\sigma_b^2 = 0$ a priori, $\operatorname{Var}(x_f \mid w_f, \mu_\beta)$ depends on the experimental design only through $n$.

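Theorem 5.1. If $\sigma_b^2 = 0$, $x_f$ is $N(x_0, \sigma_0^2)$ and $w(d, x_f) = (d - x_f)^2$, then

$$ W(x) = \left\{ b^2 / [\sigma^2 + \sigma_\alpha^2] + 1/\sigma_0^2 \right\}^{-1}, $$

where $\sigma_\alpha^2 = (n/\sigma^2 + 1/\sigma_a^2)^{-1}$.

Proof. Since $p(w_f \mid x_f, \alpha, \beta = b)$ is $N[(\alpha - \mu_\alpha) + b(x_f - x_0),\ \sigma^2]$ and, a posteriori, $\alpha$ is $N(\mu_\alpha, \sigma_\alpha^2)$, the predictive density for $w_f$, $p(w_f \mid x_f, \beta = b)$, is

$$ N[b(x_f - x_0),\ \sigma^2 + \sigma_\alpha^2]. $$

If $p(x_f)$ is $N(x_0, \sigma_0^2)$ a priori, then by Bayes' theorem

$$ \begin{aligned} p(x_f \mid w_f) &\propto p(w_f \mid x_f)\, p(x_f) \\ &\propto \exp\left\{ -[w_f - b(x_f - x_0)]^2 / 2(\sigma^2 + \sigma_\alpha^2) \right\} \exp\left[ -(x_f - x_0)^2 / 2\sigma_0^2 \right]. \end{aligned} $$

Collecting terms in the exponents we find

$$ \operatorname{Var}(x_f \mid w_f) = \left[ b^2/(\sigma^2 + \sigma_\alpha^2) + 1/\sigma_0^2 \right]^{-1} $$

while

$$ E(x_f \mid w_f) = \frac{[b^2/(\sigma^2 + \sigma_\alpha^2)](x_0 + w_f/b) + x_0/\sigma_0^2}{b^2/(\sigma^2 + \sigma_\alpha^2) + 1/\sigma_0^2}. $$

Since $\operatorname{Var}(x_f \mid w_f)$ does not depend on $w_f$ in this case,

$$ W(x) = \operatorname{Var}(x_f \mid w_f). $$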
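Theorem 5.1 is easy to check by simulation. The sketch below uses hypothetical function names and illustrative numbers. It codes the closed form and a Monte Carlo estimate that draws $\alpha$, the calibration data, $x_f$ and $y_f$. It estimates $x_f$ by $E(x_f \mid w_f)$ from the proof and averages the squared errors, whose expectation is $W(x)$. In line with Corollary 3.2, the design levels enter only through $n$.

```python
import math
import random


def w_beta_known(n, sigma, sigma_a, b, sigma0):
    """Closed form of Theorem 5.1 (sigma_b^2 = 0, x_f ~ N(x0, sigma0^2), squared loss)."""
    var_alpha = 1.0 / (n / sigma**2 + 1.0 / sigma_a**2)      # posterior Var(alpha)
    return 1.0 / (b**2 / (sigma**2 + var_alpha) + 1.0 / sigma0**2)


def simulate_w(design, x0, a, sigma_a, b, sigma, sigma0, reps=50_000, seed=0):
    """Monte Carlo estimate of W(x): mean squared error of the estimate E(x_f | w_f)."""
    rng = random.Random(seed)
    var_alpha = 1.0 / (len(design) / sigma**2 + 1.0 / sigma_a**2)
    tau2 = sigma**2 + var_alpha                 # Var(w_f | x_f) when beta = b is known
    prec = b**2 / tau2 + 1.0 / sigma0**2        # posterior precision of x_f
    total = 0.0
    for _ in range(reps):
        alpha = rng.gauss(a, sigma_a)
        ys = [alpha + b * (x - x0) + rng.gauss(0.0, sigma) for x in design]
        resid = sum(y - b * (x - x0) for x, y in zip(design, ys))
        mu_alpha = var_alpha * (a / sigma_a**2 + resid / sigma**2)   # posterior mean of alpha
        xf = rng.gauss(x0, sigma0)
        yf = alpha + b * (xf - x0) + rng.gauss(0.0, sigma)
        wf = yf - mu_alpha
        estimate = x0 + (b * wf / tau2) / prec                       # E(x_f | w_f)
        total += (estimate - xf) ** 2
    return total / reps


# Illustrative values (not from the paper).
design = [8.0, 9.0, 10.5, 12.0, 13.0]
exact = w_beta_known(len(design), sigma=0.5, sigma_a=1.0, b=2.0, sigma0=1.0)
approx = simulate_w(design, x0=10.0, a=1.0, sigma_a=1.0, b=2.0, sigma=0.5, sigma0=1.0)
print(f"Theorem 5.1: {exact:.5f}   Monte Carlo: {approx:.5f}")
for n in [0, 1, 5, 20, 100]:
    print(n, round(w_beta_known(n, 0.5, 1.0, 2.0, 1.0), 5))
```

The printed values fall as $n$ grows. This agrees with Corollary 3.2 and the remark on Lemma 2.1: when $\beta$ is known, the best design simply takes $n$ as large as possible.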


The Case When $x_1 = x_2 = \cdots = x_n = x_0$ and $w(d, x_f) = (d - x_f)^2$

In this case we can numerically calculate $W(x)$ using two nested subroutines. Because of the comparative ease of computation, this is almost as good as a closed form solution.

As we saw in Section 3, this choice of $x$ will provide no information about $\beta$. Hence $\mu_\beta = b$ and

$$ W(x) = E_{w_f} \operatorname{Var}(x_f \mid w_f). $$

Thus only two nested subroutines are required. In this case $w_f$ given $x_f$ and $\mu_\beta = b$ is $N[b(x_f - x_0),\ s^2(x_f)]$, where

$$ s^2(x_f) = \sigma^2 + \sigma_\alpha^2 + \sigma_b^2 (x_f - x_0)^2 $$

and $\sigma_\alpha^2 = (n/\sigma^2 + 1/\sigma_a^2)^{-1}$.
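A sketch of the two nested subroutines for this design follows. It assumes $x_f \sim N(x_0, \sigma_0^2)$. The equally spaced grid sums, the grid sizes and the function name `w_equal_levels` are our illustrative choices, not the paper's. The outer loop runs over $w_f$. For each $w_f$ the inner loop over $x_f$ gives $\operatorname{Var}(x_f \mid w_f)$ together with a weight proportional to $p(w_f)$. Setting $\sigma_b = 0$ should recover Theorem 5.1.

```python
import math


def w_equal_levels(n, b, sigma, sigma_a, sigma_b, sigma0, n_w=161, n_u=161, half_width=8.0):
    """W(x) for x_1 = ... = x_n = x0 and squared-error loss, via two nested loops.

    Here mu_beta = b, u = x_f - x0 ~ N(0, sigma0^2) and
    w_f | u ~ N(b u, s^2(u)) with s^2(u) = sigma^2 + sigma_alpha^2 + sigma_b^2 u^2.
    """
    var_alpha = 1.0 / (n / sigma**2 + 1.0 / sigma_a**2)       # posterior Var(alpha)
    us = [sigma0 * half_width * (2 * k / (n_u - 1) - 1) for k in range(n_u)]
    prior = [math.exp(-0.5 * (u / sigma0) ** 2) for u in us]
    s2 = [sigma**2 + var_alpha + sigma_b**2 * u * u for u in us]
    sd_w = math.sqrt(b**2 * sigma0**2 + sigma**2 + var_alpha + sigma_b**2 * sigma0**2)
    acc = norm = 0.0
    for k in range(n_w):                                      # outer loop over w_f
        w = sd_w * half_width * (2 * k / (n_w - 1) - 1)
        g0 = g1 = g2 = 0.0
        for u, p_u, v in zip(us, prior, s2):                  # inner loop over x_f
            g = p_u * math.exp(-(w - b * u) ** 2 / (2 * v)) / math.sqrt(v)
            g0 += g
            g1 += g * u
            g2 += g * u * u
        if g0 == 0.0:                   # numerically negligible value of w_f
            continue
        mean = g1 / g0
        acc += g0 * (g2 / g0 - mean**2)        # g0 is proportional to p(w_f)
        norm += g0
    return acc / norm


# Illustrative values (not from the paper).
b, sigma, sigma_a, sigma0 = 2.0, 0.5, 1.0, 1.0
for sigma_b in [0.0, 0.5, 1.0]:
    row = [round(w_equal_levels(n, b, sigma, sigma_a, sigma_b, sigma0), 5) for n in (1, 5, 20)]
    print(f"sigma_b = {sigma_b}: W for n = 1, 5, 20 -> {row}")
```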

a  a  * 


The General Case

To numerically calculate $W(x)$ using three nested subroutines we need the density of $\mu_\beta$. From Table 4.1 we see that $\mu_\beta$ is a linear combination of $z_1$ and $z_2$. Since $z_1$ and $z_2$ are unconditionally bivariate normal, it follows that $\mu_\beta$ is $N(b, \sigma_{\mu_\beta}^2)$, where $\sigma_{\mu_\beta}^2$ depends on the covariance matrix of $(z_1, z_2)$.

It is easy to verify that $z_1$ is $N(0, \sigma_1^2)$, where

$$ \sigma_1^2 = n^2 \sigma_a^2 + \left[\sum_i (x_i - x_0)\right]^2 \sigma_b^2 + n\sigma^2, $$

while $z_2$ is $N(0, \sigma_2^2)$, where

$$ \sigma_2^2 = \left[\sum_i (x_i - x_0)\right]^2 \sigma_a^2 + \left[\sum_i (x_i - x_0)^2\right]^2 \sigma_b^2 + \sigma^2 \sum_i (x_i - x_0)^2. $$
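The variance of $\mu_\beta$ follows from these moments. The sketch below uses hypothetical function names and illustrative numbers. It also needs $\operatorname{Cov}(z_1, z_2) = n \sum_i (x_i - x_0)\, \sigma_a^2 + \sum_i (x_i - x_0) \sum_i (x_i - x_0)^2\, \sigma_b^2 + \sigma^2 \sum_i (x_i - x_0)$. That covariance is not stated in the text above; it follows from the same representation of $z_1$ and $z_2$ used for their variances. As a check, the code uses the law of total variance. The posterior variance $\sigma_\beta^2$ of Table 4.1 does not depend on $y$, so $\operatorname{Var}(\mu_\beta) = \sigma_b^2 - \sigma_\beta^2$.

```python
def z_moments(xs, x0, sigma, sigma_a, sigma_b):
    """Unconditional variances and covariance of (z1, z2) for a design xs."""
    n = len(xs)
    s1 = sum(x - x0 for x in xs)
    s2 = sum((x - x0) ** 2 for x in xs)
    var_z1 = n**2 * sigma_a**2 + s1**2 * sigma_b**2 + n * sigma**2
    var_z2 = s1**2 * sigma_a**2 + s2**2 * sigma_b**2 + sigma**2 * s2
    # Cov(z1, z2) is derived here, not quoted from the text:
    # z1 = n(alpha-a) + s1(beta-b) + sum(eps_i), z2 = s1(alpha-a) + s2(beta-b) + sum(eps_i (x_i - x0)).
    cov_z = n * s1 * sigma_a**2 + s1 * s2 * sigma_b**2 + sigma**2 * s1
    return n, s1, s2, var_z1, var_z2, cov_z


def var_mu_beta(xs, x0, sigma, sigma_a, sigma_b):
    """Variance of mu_beta = b + (A z2 - s1 z1) / D (Table 4.1) over the data."""
    n, s1, s2, var_z1, var_z2, cov_z = z_moments(xs, x0, sigma, sigma_a, sigma_b)
    A = n + sigma**2 / sigma_a**2
    B = s2 + sigma**2 / sigma_b**2
    D = A * B - s1**2
    return (A**2 * var_z2 - 2 * A * s1 * cov_z + s1**2 * var_z1) / D**2


# Illustrative design and prior (not from the paper).
x0, sigma, sigma_a, sigma_b = 10.0, 0.5, 1.0, 0.8
design = [8.0, 9.5, 10.0, 11.0, 12.5]
v = var_mu_beta(design, x0, sigma, sigma_a, sigma_b)

# Cross-check with the law of total variance: Var(mu_beta) = sigma_b^2 - posterior Var(beta).
n, s1, s2, *_ = z_moments(design, x0, sigma, sigma_a, sigma_b)
A = n + sigma**2 / sigma_a**2
D = A * (s2 + sigma**2 / sigma_b**2) - s1**2
print(f"Var(mu_beta) = {v:.6f},  sigma_b^2 - sigma_beta^2 = {sigma_b**2 - sigma**2 * A / D:.6f}")
```

For the design $x_1 = \cdots = x_n = x_0$ the same function returns 0. This matches the observation in Section 3 that such a design leaves $\mu_\beta = b$.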
APPENDIX

DERIVATION OF THE POSTERIOR PARAMETERS IN TABLE 4.1


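Suppose $x$ and $y$ have joint density

$$ p(x, y) \propto \exp\left[ -(a x^2 + b x + c y^2 + d y + e x y)/2 \right], $$

where $a$, $b$, $c$, $d$ and $e$ are constants with $a > 0$, $c > 0$ and $4ac > e^2$. Then it follows that the pair $(x, y)$ has a bivariate normal distribution, i.e.

$$ p(x, y) = \frac{ \exp\left\{ -\left[ (x - \mu_\alpha)^2/\sigma_\alpha^2 - 2\rho_{\alpha\beta}(x - \mu_\alpha)(y - \mu_\beta)/\sigma_\alpha \sigma_\beta + (y - \mu_\beta)^2/\sigma_\beta^2 \right] / 2(1 - \rho_{\alpha\beta}^2) \right\} }{ 2\pi \sigma_\alpha \sigma_\beta \sqrt{1 - \rho_{\alpha\beta}^2} }. $$

By matching coefficients of corresponding terms in the exponents, $\mu_\alpha$, $\mu_\beta$, $\sigma_\alpha$, $\sigma_\beta$ and $\rho_{\alpha\beta}$ can be expressed in terms of $a$, $b$, $c$, $d$ and $e$.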

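The coefficient of $x^2$ is $a = 1/[\sigma_\alpha^2 (1 - \rho_{\alpha\beta}^2)]$.

The coefficient of $y^2$ is $c = 1/[\sigma_\beta^2 (1 - \rho_{\alpha\beta}^2)]$.

The coefficient of $xy$ is $e = -2\rho_{\alpha\beta}/[(1 - \rho_{\alpha\beta}^2)\sigma_\alpha \sigma_\beta]$.

Now $4\rho_{\alpha\beta}^2 = e^2/ac$ implies $\rho_{\alpha\beta} = -e/(2\sqrt{ac})$ and

$$ \sigma_\alpha^2 = 1/[a(1 - \rho_{\alpha\beta}^2)] = 4c/[4ac - e^2], $$

$$ \sigma_\beta^2 = 1/[c(1 - \rho_{\alpha\beta}^2)] = 4a/[4ac - e^2]. $$

From the coefficient of $x$ we have

$$ b = \left[ -2\mu_\alpha/\sigma_\alpha^2 + 2\rho_{\alpha\beta}\mu_\beta/\sigma_\alpha \sigma_\beta \right] / (1 - \rho_{\alpha\beta}^2), $$

while from the coefficient of $y$ we have

$$ d = \left[ -2\mu_\beta/\sigma_\beta^2 + 2\rho_{\alpha\beta}\mu_\alpha/\sigma_\alpha \sigma_\beta \right] / (1 - \rho_{\alpha\beta}^2). $$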
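The coefficient matching can be checked mechanically. The sketch below uses hypothetical names and illustrative numbers and implements the appendix formulas in both directions. For the means it solves the $b$ and $d$ equations in their equivalent form, $2a\mu_\alpha + e\mu_\beta = -b$ and $e\mu_\alpha + 2c\mu_\beta = -d$ (the stationarity conditions of the exponent), by Cramer's rule. It then applies the recovery to the posterior exponent of $(\alpha - a, \beta - b)$. With $A = n + \sigma^2/\sigma_a^2$ and $B = \sum_i (x_i - x_0)^2 + \sigma^2/\sigma_b^2$, that exponent has coefficients $A/\sigma^2$, $-2z_1/\sigma^2$, $B/\sigma^2$, $-2z_2/\sigma^2$ and $2\sum_i (x_i - x_0)/\sigma^2$. The result is compared with Table 4.1.

```python
import math


def params_from_quadratic(a, b, c, d, e):
    """(mu_x, mu_y, sigma_x, sigma_y, rho) for p(x, y) ~ exp[-(a x^2 + b x + c y^2 + d y + e x y)/2]."""
    det = 4 * a * c - e * e
    if a <= 0 or c <= 0 or det <= 0:
        raise ValueError("the quadratic form is not positive definite")
    rho = -e / (2 * math.sqrt(a * c))
    sigma_x = math.sqrt(4 * c / det)
    sigma_y = math.sqrt(4 * a / det)
    # The b and d equations are linear in (mu_x, mu_y); equivalently
    # 2a mu_x + e mu_y = -b and e mu_x + 2c mu_y = -d, solved by Cramer's rule.
    mu_x = (-2 * c * b + e * d) / det
    mu_y = (-2 * a * d + e * b) / det
    return mu_x, mu_y, sigma_x, sigma_y, rho


def quadratic_from_params(mu_x, mu_y, sigma_x, sigma_y, rho):
    """The appendix's coefficient-matching formulas, in the forward direction."""
    k = 1 - rho**2
    a = 1 / (sigma_x**2 * k)
    c = 1 / (sigma_y**2 * k)
    e = -2 * rho / (k * sigma_x * sigma_y)
    b = (-2 * mu_x / sigma_x**2 + 2 * rho * mu_y / (sigma_x * sigma_y)) / k
    d = (-2 * mu_y / sigma_y**2 + 2 * rho * mu_x / (sigma_x * sigma_y)) / k
    return a, b, c, d, e


# Round trip with illustrative parameters.
original = (1.5, -0.7, 2.0, 0.5, -0.3)
recovered = params_from_quadratic(*quadratic_from_params(*original))
print("round trip:", [round(p, 10) for p in recovered])

# Posterior of (alpha - a, beta - b) from illustrative sufficient statistics.
n, S1, S2, z1, z2 = 5, 1.0, 11.5, 0.8, 2.3
sigma, sigma_a, sigma_b = 0.5, 2.0, 1.5
A = n + sigma**2 / sigma_a**2
B = S2 + sigma**2 / sigma_b**2
D = A * B - S1**2
post = params_from_quadratic(A / sigma**2, -2 * z1 / sigma**2, B / sigma**2,
                             -2 * z2 / sigma**2, 2 * S1 / sigma**2)
table = ((z1 * B - S1 * z2) / D, (A * z2 - S1 * z1) / D,      # mu_alpha - a, mu_beta - b
         math.sqrt(sigma**2 * B / D), math.sqrt(sigma**2 * A / D),
         -S1 / math.sqrt(A * B))                               # Cov / (sigma_alpha sigma_beta)
print("appendix route:", post)
print("Table 4.1     :", table)
```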


